SEQUENTIAL ANALYSIS OF THE PROPORTIONAL HAZARDS MODEL

T. Sellke and D. Siegmund
Stanford University


1. Introduction

The proportional hazards model of survival analysis and its analysis by the method of partial likelihood originate in the work of Cox (1972, 1975), who argued that under general conditions maximum partial likelihood estimators have asymptotically normal distributions very similar to the asymptotic distributions of ordinary maximum likelihood estimators. Since then a number of authors have given more systematic discussions of central limit results for survival analysis; see Gill (1980), Tsiatis (1981a), and Andersen and Gill (1981).

In this paper we are concerned with related questions in the context of controlled clinical trials which possess the following two important features: (a) entry into the trial occurs at different times for different patients, and (b) it seems desirable to observe the data sequentially so that the trial may be terminated at the earliest possible moment if large treatment effects appear to be present. The authors cited above use as their starting point Cox's observation that the derivative of the log partial likelihood is a martingale, to which an appropriate martingale central limit theorem may be applied. However, with sequential observation and staggered entry, this process is no longer in general a martingale, and the approach breaks down. We shall show that it can be approximated by a martingale uniformly in time, in order to conclude that the process of maximum partial likelihood estimates observed in a certain time scale behaves like a Brownian motion process.

Previous work on this problem seems to be limited to a Monte Carlo study by Gail, DeMets, and Slud (1981), the paper of Tsiatis (1981b), and a recent manuscript of Slud (1982). Although Slud is concerned with the special case of testing a simple null hypothesis, there is some overlap with our work, which we discuss later.

The results of this paper are not unexpected. However, it is quite surprising to find their complete justification to be so difficult, particularly in comparison to proofs of the superficially similar results of the authors cited above. In this regard it is interesting to note that Jones and Whitehead (1979) are willing to accept a cursory justification for related results, which they regard as almost obvious and propose to use as a basis for certain sequential tests. For the somewhat related Gehan test they offer a similar informal argument, but according to Slud and Wei (1982), their conclusion is incorrect in this case. Our methods do not provide satisfactory results concerning the joint distribution of the Gehan statistic, which seems to involve a substantially more difficult problem. The multivariate case is also more difficult (even in its formulation), and our results here are not yet complete.

2. Notation and Formulation of the Problem

Assume that $n$ patients enter a clinical trial at times $y_1, y_2, \ldots, y_n$, which may be nonrandom or occur according to an arbitrary point process. Associated with the $i$th patient is a triple $(z_i, x_i, c_i)$, where $z_i$ is a covariate, $x_i$ is the survival time following entry into the trial, and $c_i$ (possibly infinite) is a censoring variable. Thus the $i$th person is on test until time $y_i + x_i \wedge c_i$. If $x_i \le c_i$, he dies while on test, and the value of $x_i$ is recorded. Otherwise that observation is censored, and it is known only that $x_i > c_i$. At any time $t$ there is in effect a second censoring variable $(t - y_i)^+$, in the sense that the time on test of patient $i$ prior to $t$ is $x_i \wedge c_i \wedge (t - y_i)^+$. We shall refer to $x_i$, $c_i$, and $(t - y_i)^+$ as "age" variables: the age of the $i$th patient at death, at censoring, and at time $t$, respectively.

Our basic stochastic assumptions are that $(z_i, x_i, c_i)$, $i = 1, 2, \ldots, n$, are independent and identically distributed and are independent of the arrivals $y_1, \ldots, y_n$. We assume also that the $z_i$ are uniformly bounded and that, given $z_i$, $x_i$ and $c_i$ are conditionally independent, with $x_i$ having a cumulative hazard function of the form

$$ d\Lambda_i(s) = e^{\theta z_i}\,\lambda(s)\,ds \tag{1} $$

for some unknown parameter $\theta$ and baseline hazard function $\lambda$.

For some results $z_i$ can be a vector; then $\theta$ is also, and $\theta z_i$ denotes the scalar product. For simplicity of presentation we consider explicitly only the scalar case.

All probabilities and expectations should be considered as conditional, given $y_1, y_2, \ldots, y_n$.

It is convenient to introduce the notation

$$ N_i(t,s) = I\{y_i + x_i \le t,\ x_i \le c_i,\ x_i \le s\} \qquad (s \le t) \tag{2} $$

to indicate that the $i$th patient arrived and died before time $t$, that he was uncensored, and that he was of age $\le s$ at the time of death. We also define the set of patients at risk at time $t$ and age $s$ by

$$ R(t,s) = \{ j : y_j \le t - s,\ x_j \wedge c_j \ge s \} \qquad (s \le t). \tag{3} $$
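The counting process (2) and the risk set (3) are straightforward to compute from raw data. The following minimal Python sketch assumes the data are held in plain lists `y`, `x`, `c` (with `math.inf` for an infinite censoring time); the function names `time_on_test`, `N`, and `risk_set` are hypothetical, chosen only for this illustration. The condition $y_j \le t - s$ is coded as `y[j] + s <= t`.

```python
import math

def time_on_test(t, y_i, x_i, c_i):
    """Age of a patient at calendar time t: x_i ∧ c_i ∧ (t - y_i)^+."""
    return min(x_i, c_i, max(t - y_i, 0.0))

def N(i, t, s, y, x, c):
    """Indicator N_i(t, s) of (2), for s <= t: patient i entered and died by
    calendar time t, was uncensored, and was of age <= s at death."""
    return int(y[i] + x[i] <= t and x[i] <= c[i] and x[i] <= s)

def risk_set(t, s, y, x, c):
    """Risk set R(t, s) of (3): y_j <= t - s (so patient j can have reached
    age s by time t) and x_j ∧ c_j >= s."""
    return [j for j in range(len(y)) if y[j] + s <= t and min(x[j], c[j]) >= s]

# Three hypothetical patients: entry times, survival times, censoring times.
y = [0.0, 1.0, 2.0]
x = [1.5, 0.5, 3.0]
c = [math.inf, 0.2, math.inf]   # patient 1 is censored at age 0.2

t = 2.5
print([time_on_test(t, y[i], x[i], c[i]) for i in range(3)])  # [1.5, 0.2, 0.5]
print([N(i, t, 2.0, y, x, c) for i in range(3)])             # [1, 0, 0]
print(risk_set(t, 0.4, y, x, c), risk_set(t, 1.0, y, x, c))  # [0, 2] [0]
```

Note that patient 2, who entered at time 2.0, is in $R(2.5, 0.4)$ but not in $R(2.5, 1.0)$: by time $t = 2.5$ he has only reached age 0.5.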

With this notation Cox's (1975) log partial likelihood for $\theta$ can be expressed by

$$ \ell_n(t,\theta) = \sum_{i=1}^n \int_{[0,t]} \Big\{ \theta z_i - \log\Big( \sum_{j \in R(t,s)} e^{\theta z_j} \Big) \Big\}\, N_i(t,ds). \tag{4} $$


Differentiating (4) with respect to $\theta$ gives the score process

$$ \dot\ell_n(t,\theta) = \sum_{i=1}^n \int_{[0,t]} \Big\{ z_i - \frac{\sum_{j \in R(t,s)} z_j e^{\theta z_j}}{\sum_{j \in R(t,s)} e^{\theta z_j}} \Big\}\, N_i(t,ds). \tag{5} $$


The maximum partial likelihood estimator of $\theta$ is the solution $\theta = \hat\theta_n(t)$ of

$$ \dot\ell_n(t,\theta) = 0. $$


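Tests of the hypothesis $H_0\colon \theta = \theta_0$ can be based on $\hat\theta_n$ or directly on $\dot\ell_n(t,\theta_0)$. The usual Taylor series approximation

$$ 0 = \dot\ell_n(t,\hat\theta_n) = \dot\ell_n(t,\theta) + (\hat\theta_n - \theta)\,\ddot\ell_n(t,\theta) + \cdots \tag{6} $$

indicates that the asymptotic behavior of $\hat\theta_n$ is intimately associated with that of $\dot\ell_n(t,\theta)$, which we now consider.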
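Expansion (6) is also the basis of the usual numerical method for computing $\hat\theta_n(t)$. The Python sketch below is a minimal illustration, not the authors' procedure: at a fixed calendar time $t$ it evaluates (4), (5), and minus the second derivative of (4) (the $e^{\theta z}$-weighted risk-set variance of the covariate, given later in (36)), and it iterates (6) truncated after the linear term, i.e. Newton-Raphson. It assumes no tied death ages and that the estimator exists; when the covariate separates the deaths perfectly the partial likelihood can be monotone and the iteration will not converge. The function names and the data are hypothetical.

```python
import math

def risk_set(t, s, y, x, c):
    """R(t, s) of (3): y_j <= t - s (written y_j + s <= t) and x_j ∧ c_j >= s."""
    return [j for j in range(len(y)) if y[j] + s <= t and min(x[j], c[j]) >= s]

def pl_terms(theta, t, y, z, x, c):
    """Return (l_n(t, theta), dl_n(t, theta), -d2l_n(t, theta)).

    Each observed death (N_i(t, t) = 1) enters at its age of death s = x_i:
    theta*z_i - log m contributes to (4), z_i - mu to (5), and the weighted
    risk-set variance of z to minus the second derivative, where m is the sum
    of e^{theta z_j} and mu the e^{theta z}-weighted mean of z_j over R(t, s).
    """
    ll = score = info = 0.0
    for i in range(len(y)):
        if not (y[i] + x[i] <= t and x[i] <= c[i]):
            continue                                  # no observed death for i by time t
        R = risk_set(t, x[i], y, x, c)                # contains i itself
        w = [math.exp(theta * z[j]) for j in R]
        m = sum(w)
        mu = sum(wj * z[j] for wj, j in zip(w, R)) / m
        var = sum(wj * (z[j] - mu) ** 2 for wj, j in zip(w, R)) / m
        ll += theta * z[i] - math.log(m)
        score += z[i] - mu
        info += var
    return ll, score, info

def mple(t, y, z, x, c, theta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson for dl_n(t, theta) = 0, i.e. (6) truncated after the linear term."""
    for _ in range(max_iter):
        _, score, info = pl_terms(theta, t, y, z, x, c)
        step = score / info          # assumes positive observed information
        theta += step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical data: staggered entry, binary covariate, no censoring.
y = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
z = [0, 1, 0, 1, 0, 1]
x = [1.0, 0.8, 2.0, 0.6, 1.2, 1.5]
c = [math.inf] * 6
theta_hat = mple(10.0, y, z, x, c)
print(theta_hat, pl_terms(theta_hat, 10.0, y, z, x, c)[1])  # score is ~0 at the root
```

Because the log partial likelihood is concave in $\theta$, Newton-Raphson started at $\theta = 0$ typically converges in a few steps for data like these; a more careful implementation would add step halving and a correction for tied death ages.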

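Let $\mathcal{F}_{t,s}$ denote the class of events at time $t$ and age $s$, i.e. the $\sigma$-algebra generated by $I\{y_i \le t\}$, $z_i I\{y_i \le t\}$, $I\{x_i \le (t-y_i)^+ \wedge s \wedge c_i\}$, $x_i I\{x_i \le (t-y_i)^+ \wedge s \wedge c_i\}$, $I\{c_i \le (t-y_i)^+ \wedge s \wedge x_i\}$, and $c_i I\{c_i \le (t-y_i)^+ \wedge s \wedge x_i\}$, $i = 1, 2, \ldots, n$. For each $i = 1, 2, \ldots, n$, by (2), (3), and the conditional independence of $c_i$ and $x_i$ given $z_i$, we have as $\Delta \to 0$

$$ E\{N_i(t,s+\Delta) - N_i(t,s) \mid \mathcal{F}_{t,s}\} = I\{i \in R(t,s)\} \int_{(s,s+\Delta]} \frac{P\{c_i \ge u \mid z_i\}\, P\{x_i \in du \mid z_i\}}{P\{c_i \ge s \mid z_i\}\, P\{x_i \ge s \mid z_i\}} + o(\Delta), $$

so, at least under the additional assumption of continuity of the conditional distributions of $c_i$ given $z_i$,

$$ E\{N_i(t,s+\Delta) - N_i(t,s) \mid \mathcal{F}_{t,s}\} = I\{i \in R(t,s)\}\,\{\Lambda_i(s+\Delta) - \Lambda_i(s)\} + o(\Delta), \tag{7} $$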

where $\Lambda_i$ is given by (1). It follows from (7) that for any fixed $t$,

$$ \big\{ N_i(t,s) - \Lambda_i\big((t-y_i)^+ \wedge x_i \wedge c_i \wedge s\big),\ \mathcal{F}_{t,s} \big\} \tag{8} $$

is a martingale in $s$ (cf. Gill, 1980, p. 14, or Liptser and Shiryayev, 1978, p. 245).

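Let $\Lambda_i(t,ds) = I\{i \in R(t,s)\}\,\Lambda_i(ds)$ and

$$ \dot\ell_n(t,s,\theta) = \sum_{i=1}^n \int_{[0,s]} \Big\{ z_i - \frac{\sum_{j \in R(t,u)} z_j e^{\theta z_j}}{\sum_{j \in R(t,u)} e^{\theta z_j}} \Big\} \{ N_i(t,du) - \Lambda_i(t,du) \}. \tag{9} $$

It follows from (5) and simple algebra that

$$ \dot\ell_n(t,t,\theta) = \dot\ell_n(t,\theta). $$

Indeed, by (1), $\Lambda_i(t,du) = I\{i \in R(t,u)\}\, e^{\theta z_i}\lambda(u)\,du$, so the compensator terms in (9) contribute $\lambda(u)\,du$ times the sum over $i \in R(t,u)$ of $e^{\theta z_i}$ multiplied by the expression in braces; this sum vanishes because the ratio in braces is the $e^{\theta z}$-weighted average of the $z_j$ over $R(t,u)$.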
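The cancellation behind $\dot\ell_n(t,t,\theta) = \dot\ell_n(t,\theta)$ can be checked numerically: for any risk set, $\sum_{i \in R} \{z_i - \mu\} e^{\theta z_i} = 0$ when $\mu$ is the $e^{\theta z}$-weighted mean of the $z_i$ over the same set. The sketch below uses randomly generated covariates and values of $\theta$; the helper name `compensator_weight_sum` is hypothetical.

```python
import math
import random

def compensator_weight_sum(theta, z_risk):
    """Sum over a risk set of {z_i - mu} e^{theta z_i}, where mu is the
    e^{theta z}-weighted mean of z over the same set. This is the factor
    multiplying lambda(u) du in the Lambda_i(t, du) part of (9)."""
    w = [math.exp(theta * zi) for zi in z_risk]
    mu = sum(wi * zi for wi, zi in zip(w, z_risk)) / sum(w)
    return sum((zi - mu) * wi for zi, wi in zip(z_risk, w))

rng = random.Random(0)
worst = max(
    abs(compensator_weight_sum(rng.uniform(-2, 2),
                               [rng.uniform(-1, 1) for _ in range(rng.randint(1, 50))]))
    for _ in range(1000)
)
print(worst)  # of the order of floating-point rounding error
```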


Moreover, the stochastic integral in (9) inherits the martingale property of (8), so for each fixed $t$

$$ \{ \dot\ell_n(t,s,\theta),\ \mathcal{F}_{t,s} \} \tag{10} $$

is a martingale in $s$ (Gill, 1980, p. 10, or Liptser and Shiryayev, 1978, p. 268).

This martingale property in $s$ of $\dot\ell_n(t,s,\theta)$ has been the fruitful starting point for an analysis of the asymptotic normality of $\dot\ell_n(t,\theta) = \dot\ell_n(t,t,\theta)$ at one fixed point in time (cf. Gill, 1980, or Andersen and Gill, 1981). However, it does not provide useful information about the joint distribution of $\dot\ell_n(t,\theta)$ at different values of $t$. It is easy to show by examples that for general entry times the process $\dot\ell_n(t,\theta)$ is not a martingale in $t$, and hence it is necessary to uncover some additional structure before considering central limit theorems.

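Let $N_i(t) = N_i(t,t)$ and $\mathcal{F}_t = \mathcal{F}_{t,t}$. An argument similar to that leading to (7) shows that

$$ E\{N_i(t+\Delta) - N_i(t) \mid \mathcal{F}_t\} = I\{i \in R(t,(t-y_i)^+)\}\,\{\Lambda_i(t+\Delta-y_i) - \Lambda_i(t-y_i)\} + o(\Delta). \tag{11} $$

Hence, with the notation $A_i(dt) = I\{i \in R(t,(t-y_i)^+)\}\,\Lambda_i\{d(t-y_i)\}$, we see that

$$ \{ N_i(t) - A_i(t),\ \mathcal{F}_t \} \tag{12} $$

is a martingale in $t$.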
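A simple consequence of (12) is $E\{N_i(t) - A_i(t)\} = 0$ for every $t$, which can be checked by simulation. The sketch below assumes ingredients that are not in the paper: a constant baseline hazard $\lambda$, a binary covariate with $P\{z_i = 1\} = 1/2$, exponential censoring, and a fixed entry time $y_i$. Since patient $i$ is at risk exactly for calendar times in $[y_i, y_i + x_i \wedge c_i]$, in this model $A_i(t) = e^{\theta z_i}\lambda\,\{x_i \wedge c_i \wedge (t-y_i)^+\}$. The function name is hypothetical, and the check concerns only the mean-zero property, not the martingale property itself.

```python
import math
import random

def simulate_N_minus_A(t, y_i, theta, lam, cens_rate, n_rep, seed=1):
    """Monte Carlo mean of N_i(t) - A_i(t) for one patient entering at time y_i.

    Assumed model (illustration only): z_i ~ Bernoulli(1/2); given z_i,
    x_i ~ Exponential(lam * e^{theta z_i}) and c_i ~ Exponential(cens_rate),
    independent; constant baseline hazard lam.
    """
    rng = random.Random(seed)
    a = max(t - y_i, 0.0)                       # (t - y_i)^+
    total = 0.0
    for _ in range(n_rep):
        z = rng.randint(0, 1)
        hazard = lam * math.exp(theta * z)      # e^{theta z_i} * lambda
        x = rng.expovariate(hazard)
        c = rng.expovariate(cens_rate)
        n_obs = 1.0 if x <= min(c, a) else 0.0  # N_i(t): uncensored death by time t
        comp = hazard * min(x, c, a)            # A_i(t): hazard accrued while at risk
        total += n_obs - comp
    return total / n_rep

means = {t: simulate_N_minus_A(t, y_i=0.5, theta=0.7, lam=1.0,
                               cens_rate=0.3, n_rep=100_000)
         for t in (1.0, 2.0, 4.0)}
print(means)  # each value should be close to 0
```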

Often stochastic integration with respect to the martingale in (12) preserves the martingale property (cf. Liptser and Shiryayev, 1978, p. 268). More precisely, for our purposes we have

Lemma 1. Assume $h_i(s)$ is bounded, $\mathcal{F}_s$-measurable, and left continuous in $s$. Then

$$ \Big\{ \int_{[0,t]} h_i(s)\,\{N_i(ds) - A_i(ds)\},\ \mathcal{F}_t \Big\} $$

is a martingale in $t$.

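By a change of variable in (9),

$$ \dot\ell_n(t,\theta) = \sum_{i=1}^n \int_{[0,t]} \Big\{ z_i - \frac{\sum_{j \in R(t,u-y_i)} z_j e^{\theta z_j}}{\sum_{j \in R(t,u-y_i)} e^{\theta z_j}} \Big\} \{ N_i(du) - A_i(du) \}. \tag{13} $$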


Although (13) is a stochastic integral, Lemma 1 does not apply because the integrands are not $\mathcal{F}_u$-measurable and they depend on $t$. However, an informal law of large numbers argument suggests that these integrands are approximately

$$ \Big\{ z_i - \frac{E(z_1 e^{\theta z_1};\ x_1 \wedge c_1 \ge u - y_i)}{E(e^{\theta z_1};\ x_1 \wedge c_1 \ge u - y_i)} \Big\}\, I\{y_i \le u\}, $$

which does not depend on $t$; and according to Lemma 1

$$ \begin{aligned} &\sum_{i=1}^n \int_{[0,t]} \Big\{ z_i - \frac{E(z_1 e^{\theta z_1};\ x_1 \wedge c_1 \ge u - y_i)}{E(e^{\theta z_1};\ x_1 \wedge c_1 \ge u - y_i)} \Big\} \{N_i(du) - A_i(du)\} \\ &\qquad = \sum_{i=1}^n \int_{[0,t]} \Big\{ z_i - \frac{E(z_1 e^{\theta z_1};\ x_1 \wedge c_1 \ge s)}{E(e^{\theta z_1};\ x_1 \wedge c_1 \ge s)} \Big\} \{N_i(t,ds) - \Lambda_i(t,ds)\} \end{aligned} \tag{14} $$

is a martingale in $t$.
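The law of large numbers heuristic behind (14) can be illustrated by simulation. Under an assumed model (binary covariate with $P\{z = 1\} = 1/2$, exponential survival with hazard $\lambda e^{\theta z}$, no censoring, and entry times uniform on $[0,1]$ independent of everything else; all hypothetical choices), the population ratio in (14) has the closed form $a e^{-a\lambda s}/(e^{-\lambda s} + a e^{-a\lambda s})$ with $a = e^\theta$, and the risk-set ratio at $(t,s)$ should be close to it for large $n$. Because entry is independent of $(z, x, c)$, restricting to $y_j \le t - s$ does not bias the ratio. The function names are hypothetical.

```python
import math
import random

def mu_limit(s, theta, lam):
    """Population ratio E(z e^{theta z}; x >= s) / E(e^{theta z}; x >= s)
    under the assumed model z ~ Bernoulli(1/2), x | z ~ Exp(lam e^{theta z})."""
    a = math.exp(theta)
    return a * math.exp(-a * lam * s) / (math.exp(-lam * s) + a * math.exp(-a * lam * s))

def mu_risk_set(t, s, theta, y, z, x):
    """Risk-set ratio over R(t, s); with no censoring, x ∧ c = x."""
    num = den = 0.0
    for yj, zj, xj in zip(y, z, x):
        if yj + s <= t and xj >= s:            # j in R(t, s)
            w = math.exp(theta * zj)
            num += zj * w
            den += w
    return num / den if den > 0 else 0.0       # 0/0 interpreted as 0

rng = random.Random(2)
n, theta, lam = 20_000, 0.5, 1.0
y = [rng.uniform(0.0, 1.0) for _ in range(n)]
z = [rng.randint(0, 1) for _ in range(n)]
x = [rng.expovariate(lam * math.exp(theta * zi)) for zi in z]

t, s = 1.2, 0.5
print(mu_risk_set(t, s, theta, y, z, x), mu_limit(s, theta, lam))  # close for large n
```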

In broad outline, the goals of the rest of this paper are to show that the martingale in (14) is a good approximation to the score process (13) uniformly in $t$ as $n \to \infty$ (Theorem 1), and to apply a martingale central limit theorem to show that (14) (and hence (13)), suitably rescaled, behaves like a Brownian motion process asymptotically as $n \to \infty$ (Theorem 2). Finally, the asymptotic behavior of $\hat\theta_n$ is related to that of $\dot\ell_n(t,\theta)$ via (6) and a consistency argument (Theorem 3).

Before proceeding with the technical developments to follow, we make some remarks related to the asymptotic rescaling of the martingale (14).

In order that (14) behave asymptotically like a Brownian motion process, it is important that its "quadratic variation" or its "predictable quadratic variation" grow approximately linearly in $t$. (See Meyer, 1976, p. 267, for the definition of these terms in general and (33) for the special case of (14).) For (14) this linear growth does not occur in the time scale $t$, and it is convenient to introduce a data dependent transformation of time to obtain the desired linearity. From a statistical point of view the natural mechanism to effect this change of time is the observed Fisher information, or minus the second derivative of the log (partial) likelihood, $-\ddot\ell_n(t,\theta)$, which will be shown to be essentially the same as the quadratic variation of (14). Hence, for $v > 0$ let

$$ \tau_n(v,\theta) = \inf\{ t : -\ddot\ell_n(t,\theta) \ge vn \}. \tag{15} $$

Theorem 2 of Section 4 asserts that

$$ n^{-1/2}\,\dot\ell_n(\tau_n(v,\theta), \theta) \Rightarrow W(v), \tag{16} $$

where $W$ is a standard Brownian motion. Of course, in practice one must actually use the observable quantity $\tau_n(v,\hat\theta_n)$ to define the time scale. See Grambsch (1982) or Lai and Siegmund (1982) for discussions of the use of Fisher information as a means of rescaling time.
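In practice (15) can only be evaluated at finitely many calendar times. The sketch below assumes a grid of look times and the same hypothetical data layout as in the earlier sketches; it computes $-\ddot\ell_n(t,\theta)$ as the sum over observed deaths of the $e^{\theta z}$-weighted risk-set variance of the covariate (the form given later in (36)) and returns the first grid time at which it reaches $vn$. The grid, the simulation model, and the function names are illustrative assumptions. Since adding patients to a past risk set can lower its variance, $-\ddot\ell_n(t,\theta)$ need not be monotone in $t$, so the sketch reports the first crossing on the grid; substituting an estimate for $\theta$ gives the observable version.

```python
import math
import random

def information(t, theta, y, z, x, c):
    """-d2l_n(t, theta): sum over observed deaths of the weighted risk-set variance."""
    total = 0.0
    for i in range(len(y)):
        if y[i] + x[i] <= t and x[i] <= c[i]:             # N_i(t, t) = 1
            s = x[i]
            R = [j for j in range(len(y)) if y[j] + s <= t and min(x[j], c[j]) >= s]
            w = [math.exp(theta * z[j]) for j in R]
            m = sum(w)
            mu = sum(wj * z[j] for wj, j in zip(w, R)) / m
            total += sum(wj * (z[j] - mu) ** 2 for wj, j in zip(w, R)) / m
    return total

def tau_on_grid(v, theta, grid, y, z, x, c):
    """First grid time t with -d2l_n(t, theta) >= v n; math.inf if there is none."""
    n = len(y)
    for t in grid:
        if information(t, theta, y, z, x, c) >= v * n:
            return t
    return math.inf

# Hypothetical trial: 150 patients entering uniformly on [0, 2], binary covariate.
rng = random.Random(3)
n, theta = 150, 0.0
y = [rng.uniform(0.0, 2.0) for _ in range(n)]
z = [rng.randint(0, 1) for _ in range(n)]
x = [rng.expovariate(math.exp(theta * zi)) for zi in z]
c = [rng.expovariate(0.2) for _ in range(n)]
grid = [0.25 * k for k in range(1, 41)]   # looks at t = 0.25, 0.5, ..., 10
print(tau_on_grid(0.05, theta, grid, y, z, x, c))
```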

It is worth noting that much of the preceding discussion generalizes immediately to several dimensions. However, the rescaling of time indicated by (15) and (16) carries over directly only if all elements of the matrix $-\ddot\ell_n(t,\theta)$ have the same growth rates in $t$ for large $n$. Except when $\theta = 0$ this is typically not the case.

We now turn to a detailed analysis of the approximation of (5) by (14). A more thorough discussion of (16) is contained in Section 4.


3. Approximation of $\dot\ell_n(t,\theta)$ by the Martingale (14)

Theorem 1. Let $R_n(t)$ denote the difference between the martingale in (14) and $\dot\ell_n(t,\theta)$ given by (5), or equivalently by (9) with $s = t$. Then for arbitrary $\varepsilon > 0$, as $n \to \infty$,

$$ P\Big\{ \sup_{0 \le t < \infty} |R_n(t)| > \varepsilon n^{1/2} \Big\} \to 0. $$

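The proof of Theorem 1 is a consequence of the following three lemmas. It will be convenient to use the notation

$$ \mu(t,s) = \frac{\sum_{j \in R(t,s)} z_j e^{\theta z_j}}{\sum_{j \in R(t,s)} e^{\theta z_j}} \qquad (s \le t), \tag{17} $$

$$ \mu(s) = \frac{E(z_1 e^{\theta z_1};\ x_1 \wedge c_1 \ge s)}{E(e^{\theta z_1};\ x_1 \wedge c_1 \ge s)}. \tag{18} $$

In (17) and (18) we interpret 0/0 as 0. With this notation

$$ R_n(t) = \sum_{i=1}^n \int_{[0,t]} \{\mu(t,s) - \mu(s)\}\,\{N_i(t,ds) - \Lambda_i(t,ds)\}. \tag{19} $$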
Lemma 2. Uniformly in $0 \le t < \infty$,

$$ E R_n^2(t) = O(\log n). $$

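Proof. From fundamental properties of stochastic integrals

$$ E R_n^2(t) = E\Big( \sum_{i=1}^n \int_{[0,t]} \{\mu(t,s) - \mu(s)\}^2\, N_i(t,ds) \Big). $$

By considering the $i$th term and conditioning on $x_i$, $R(t,x_i)$, and the event $\{x_i \le c_i \wedge (t-y_i)^+\}$, we obtain

$$ \begin{aligned} E R_n^2(t) &= E\Big( \sum_{i=1}^n N_i(t,x_i)\, E\big[ \{\mu(t,x_i) - \mu(x_i)\}^2 \,\big|\, x_i,\ R(t,x_i),\ x_i \le c_i \wedge (t-y_i)^+ \big] \Big) \\ &\le \text{const} \cdot E\Big( \sum_{i=1}^n \frac{N_i(t,x_i)}{|R(t,x_i)|^2}\, E\Big[ \Big\{ \mu_0(x_i) \sum_{j \in R(t,x_i)} z_j e^{\theta z_j} - \mu_1(x_i) \sum_{j \in R(t,x_i)} e^{\theta z_j} \Big\}^2 \,\Big|\, x_i,\ R(t,x_i),\ x_i \le c_i \wedge (t-y_i)^+ \Big] \Big), \end{aligned} \tag{20} $$

where $\mu_\nu(s) = E(z_1^\nu e^{\theta z_1} \mid x_1 \wedge c_1 \ge s)$ for $\nu = 0$ and $1$, and $|A|$ denotes the cardinality of the set $A$. Let $R_i(t,s) = R(t,s) - \{i\}$, and observe that, given $x_i$, $x_i \le c_i \wedge (t-y_i)^+$, and $R_i(t,x_i) = \{j_1, \ldots, j_r\}$, the variables $z_{j_1}, \ldots, z_{j_r}$ are independent and identically distributed with

$$ E\big(z_j^\nu e^{\theta z_j} \,\big|\, x_i,\ R_i(t,x_i),\ x_i \le c_i \wedge (t-y_i)^+\big) = \mu_\nu(x_i), \qquad j \in R_i(t,x_i), $$

for $\nu = 0$ or $1$. Hence, except for the terms involving $i$, the conditional expectations on the right hand side of (20) involve the square of a sum of i.i.d. random variables having mean 0, and so are at most a constant multiple of $|R(t,x_i)|$. Hence

$$ E R_n^2(t) \le \text{const} \cdot E\Big( \sum_{i=1}^n \frac{N_i(t,x_i)}{|R(t,x_i)|} \Big) = E\Big[ O\Big( \log \sum_{i=1}^n N_i(t,t) \Big) \Big] = O(\log n) $$

uniformly in $t$. The logarithmic bound holds because, for fixed $t$, the risk sets $R(t,s)$ decrease in $s$, and every patient whose death is observed at an age $x_k \ge x_i$ belongs to $R(t,x_i)$. Thus, in the absence of ties, the $m$th largest observed death age has a risk set of size at least $m$, and the sum is at most $1 + 1/2 + \cdots + 1/D \le 1 + \log D$, where $D = \sum_i N_i(t,t)$.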
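The logarithmic bound at the end of the proof of Lemma 2 is deterministic, so it can be spot-checked on arbitrary data: for data without tied death ages, $\sum_i N_i(t,x_i)/|R(t,x_i)|$ never exceeds the harmonic number $1 + 1/2 + \cdots + 1/D$. The sketch below checks this on randomly generated data sets; the simulation model and the function name are arbitrary choices for illustration.

```python
import math
import random

def inverse_risk_set_sum(t, y, x, c):
    """Return (sum over observed deaths of 1/|R(t, x_i)|, number D of deaths)."""
    total, D = 0.0, 0
    for i in range(len(y)):
        if y[i] + x[i] <= t and x[i] <= c[i]:            # N_i(t, x_i) = 1
            s = x[i]
            size = sum(1 for j in range(len(y))
                       if y[j] + s <= t and min(x[j], c[j]) >= s)
            total += 1.0 / size                           # size >= 1: i is in R(t, x_i)
            D += 1
    return total, D

rng = random.Random(4)
for _ in range(200):
    n = rng.randint(1, 60)
    y = [rng.uniform(0, 3) for _ in range(n)]
    x = [rng.expovariate(1.0) for _ in range(n)]
    c = [rng.expovariate(0.5) for _ in range(n)]
    t = rng.uniform(0, 6)
    total, D = inverse_risk_set_sum(t, y, x, c)
    harmonic = sum(1.0 / m for m in range(1, D + 1))
    assert total <= harmonic + 1e-12
    assert D == 0 or harmonic <= 1 + math.log(D)
print("bound holds on all simulated data sets")
```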

Lemma 3. Let $0 < \varepsilon < 1/10$ and $0 = t_0 < t_1 < \cdots < t_{n^{1-3\varepsilon}}$. Then

$$ P\Big\{ \max_{1 \le k \le n^{1-3\varepsilon}} |R_n(t_k)| > n^{(1-\varepsilon)/2} \Big\} = O(n^{-2\varepsilon} \log n). $$

The proof of Lemma 3 is an immediate consequence of Lemma 2 and Chebyshev's inequality. It remains to make a specific choice of the points $\{t_k\}$ and show that $R_n(t)$ cannot change too rapidly between these points.
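Written out, the Chebyshev step combines a union bound over the $n^{1-3\varepsilon}$ points with Lemma 2; here $C$ denotes the constant implicit in the $O(\log n)$ bound of Lemma 2, which is uniform in $t$:

$$ \begin{aligned} P\Big\{ \max_{1 \le k \le n^{1-3\varepsilon}} |R_n(t_k)| > n^{(1-\varepsilon)/2} \Big\} &\le \sum_{k=1}^{n^{1-3\varepsilon}} \frac{E R_n^2(t_k)}{n^{1-\varepsilon}} \\ &\le n^{1-3\varepsilon} \cdot \frac{C \log n}{n^{1-\varepsilon}} = C\, n^{-2\varepsilon} \log n. \end{aligned} $$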

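Let $D_k = \sum_i \int_{[t_{k-1}, t_k)} N_i(t_k, ds - y_i)$ denote the number of deaths observed during $[t_{k-1}, t_k)$, and let

$$ H_k = \sum_i \int_{[t_{k-1}, t_k)} \Lambda_i(t_k, ds - y_i) $$

denote the associated accumulated hazard. We choose the partition $\{t_k,\ k = 0, 1, \ldots, n^{1-3\varepsilon}\}$ so that for all $k$

$$ E D_k \le 2n^{3\varepsilon}. \tag{21} $$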

Lemma 3 above and Lemma 4 below complete the proof of Theorem 1.

Lemma 4. As $n \to \infty$,

$$ P\Big\{ \max_k \sup_{t_{k-1} \le t < t_k} |R_n(t) - R_n(t_{k-1})| > n^{(1-\varepsilon)/2} \Big\} \to 0. $$

Proof. Note that if $B(p)$ is a Bernoulli variable with parameter $p$, then for $p \le 1/2$, $B(p)$ is stochastically smaller than a Poisson variable with parameter $2p$, say $\mathcal{P}(2p)$. Hence for all $0 < p < 1$, $B(p)$ is stochastically smaller than $2p + \mathcal{P}(2p)$. Since $D_k$ is a sum of independent Bernoulli variables, it follows from (21) that $D_k$ is stochastically smaller than $4n^{3\varepsilon} + \mathcal{P}(4n^{3\varepsilon})$. Hence by easy large deviation estimates

$$ P\{\max_k D_k > n^{7\varepsilon/2}\} \to 0 \qquad (n \to \infty). \tag{22} $$

On $\{H_k \ge n^{4\varepsilon}\}$, $D_k$ is stochastically larger than a Poisson random variable of mean $n^{4\varepsilon}$, and hence $P\{D_k \le n^{7\varepsilon/2},\ H_k \ge n^{4\varepsilon}\} \le P\{\mathcal{P}(n^{4\varepsilon}) \le n^{7\varepsilon/2}\} = O(n^{-1})$. Summing this over the $n^{1-3\varepsilon}$ values of $k$ and using (22), we obtain

$$ P\{\max_k H_k > n^{4\varepsilon}\} \to 0 \qquad (n \to \infty). \tag{23} $$
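The first step of this proof reduces to an elementary inequality. Since $B(p)$ takes only the values 0 and 1, $B(p) \le_{st} \mathcal{P}(2p)$ amounts to comparing the tails at 1, i.e. $p \le 1 - e^{-2p}$, which holds for $0 \le p \le 1/2$; for $p > 1/2$ the shifted bound is trivial because $2p + \mathcal{P}(2p) \ge 2p > 1 \ge B(p)$. The sketch below checks both statements on a grid of $p$ values (a numerical sanity check, not a proof); the function names are hypothetical.

```python
import math

def bernoulli_dominated_by_poisson(p):
    """B(p) <=_st Poisson(2p), via the tails P(. >= k): for k = 1 they are p and
    1 - e^{-2p}; for k >= 2 the Bernoulli tail is 0."""
    return p <= 1.0 - math.exp(-2.0 * p)

def bernoulli_dominated_by_shifted_poisson(p):
    """B(p) <=_st 2p + Poisson(2p): only P(. >= 1) needs checking."""
    if 2.0 * p >= 1.0:
        return True                       # 2p + Poisson(2p) >= 1 always
    return p <= 1.0 - math.exp(-2.0 * p)  # with 2p < 1 we need Poisson(2p) >= 1

grid = [k / 1000 for k in range(0, 1001)]
print(all(bernoulli_dominated_by_poisson(p) for p in grid if p <= 0.5))      # True
print(all(bernoulli_dominated_by_shifted_poisson(p) for p in grid if p < 1))  # True
print(bernoulli_dominated_by_poisson(0.9))                                    # False
```

The last line shows why the shift by $2p$ is needed: for $p = 0.9$, $1 - e^{-1.8} \approx 0.835 < 0.9$.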


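Let $t_{k-1} \le t < t_k$. By (19)

$$ \begin{aligned} R_n(t) - R_n(t_{k-1}) &= \int_{(t_{k-1}, t]} \{\mu(t,s) - \mu(s)\}\{N(t,ds) - \Lambda(t,ds)\} \\ &\quad + \int_{[0, t_{k-1}]} \{\mu(t,s) - \mu(s)\}\{N(t,ds) - \Lambda(t,ds) - N(t_{k-1},ds) + \Lambda(t_{k-1},ds)\} \\ &\quad + \int_{[0, t_{k-1}]} \{\mu(t,s) - \mu(t_{k-1},s)\}\{N(t_{k-1},ds) - \Lambda(t_{k-1},ds)\}, \end{aligned} \tag{24} $$

where we have used the notation

$$ N(t,ds) = \sum_i N_i(t,ds), \qquad \Lambda(t,ds) = \sum_i \Lambda_i(t,ds). $$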

By assumption the $z_i$ are bounded, and hence $\mu(t,s)$ and $\mu(s)$ are bounded. Since $N(t,ds) - N(t_{k-1},ds)$ and $\Lambda(t,ds) - \Lambda(t_{k-1},ds)$ are both positive and increasing in $t$, it follows that the first two terms on the right hand side of (24) are of order $D_k + H_k$ uniformly in $t_{k-1} \le t < t_k$. Hence by (22) and (23), and since $n^{4\varepsilon} = o(n^{(1-\varepsilon)/2})$ for $\varepsilon < 1/10$, it suffices to consider the third integral in (24). Let

$$ m(t,s) = \sum_{j \in R(t,s)} e^{\theta z_j} \tag{25} $$

and observe that

$$ \Lambda(t,ds) = m(t,s)\,\lambda(s)\,ds. \tag{26} $$

Letting $B/2$ denote a bound on the $|z_i|$, we find from (17) and some algebra that uniformly in $t_{k-1} \le t < t_k$

$$ |\mu(t,s) - \mu(t_{k-1},s)| \le B\{m(t_k,s) - m(t_{k-1},s)\}/m(t_k,s). \tag{27} $$
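Inequality (27) is deterministic and can be spot-checked numerically. For fixed age $s$ the risk sets grow with calendar time, $R(t_{k-1},s) \subseteq R(t,s) \subseteq R(t_k,s)$, so the sketch below draws such nested sets with covariates satisfying $|z_j| \le B/2$ and compares both sides of (27). The sampling scheme, the value $B = 2$, and the function names are arbitrary illustrative choices.

```python
import math
import random

def weighted_mean(theta, zs):
    """Risk-set ratio (17) for the covariates zs of one risk set (0/0 read as 0)."""
    w = [math.exp(theta * zj) for zj in zs]
    return sum(wj * zj for wj, zj in zip(w, zs)) / sum(w) if zs else 0.0

def check_27(rng, B=2.0):
    """Draw nested risk sets old ⊆ mid ⊆ new and test inequality (27)."""
    theta = rng.uniform(-3, 3)

    def draw(k):
        return [rng.uniform(-B / 2, B / 2) for _ in range(k)]

    def m(zs):                                   # m(., s) of (25)
        return sum(math.exp(theta * zj) for zj in zs)

    old = draw(rng.randint(1, 20))               # R(t_{k-1}, s)
    mid = old + draw(rng.randint(0, 10))         # R(t, s)
    new = mid + draw(rng.randint(0, 10))         # R(t_k, s)
    lhs = abs(weighted_mean(theta, mid) - weighted_mean(theta, old))
    rhs = B * (m(new) - m(old)) / m(new)
    return lhs <= rhs + 1e-12

rng = random.Random(5)
print(all(check_27(rng) for _ in range(10_000)))  # True
```

The bound holds because adding members to a risk set moves the weighted mean by at most $B$ times the fraction of total weight added, and that fraction only grows when the larger set $R(t_k,s)$ is used in the denominator.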

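Hence by (24) and (27) it suffices to show

$$ P\Big\{ \max_k \int_{[0, t_{k-1})} \frac{m(t_k,s) - m(t_{k-1},s)}{m(t_k,s)}\, N(t_{k-1},ds) > n^{(1-\varepsilon)/2} \Big\} \to 0, \tag{28} $$

$$ P\Big\{ \max_k \int_{[0, t_{k-1})} \frac{m(t_k,s) - m(t_{k-1},s)}{m(t_k,s)}\, \Lambda(t_{k-1},ds) > n^{(1-\varepsilon)/2} \Big\} \to 0. \tag{29} $$

From (26) and some algebra we see that the random variable in (29) is majorized by $\max_k H_k$: since $m(t_{k-1},s) \le m(t_k,s)$, the integrand in (29) times $\Lambda(t_{k-1},ds)$ is at most $\{m(t_k,s) - m(t_{k-1},s)\}\lambda(s)\,ds = \Lambda(t_k,ds) - \Lambda(t_{k-1},ds)$. Hence (29) follows from (23).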
k  * 

Now consider (28). It is easily seen by direct calculation that

$$ L_k(s) = \int_{[0,s]} \frac{m(t_k,u) - m(t_{k-1},u)}{m(t_k,u)}\, N(t_{k-1},du) - \{N(t_k,s) - N(t_{k-1},s)\} $$

is a supermartingale for $0 \le s \le t_{k-1}$, which changes by jumps downward of size 1 and upward of size at most equal to 1. Furthermore, $N(t_k, t_{k-1}) - N(t_{k-1}, t_{k-1}) \le D_k$, so by (22), to prove (28) it suffices to show

$$ P\{\max_k L_k(t_{k-1}) > n^{4\varepsilon}\} \to 0. \tag{30} $$

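Let $S_0 = 0$, and for $j = 1, 2, \ldots$ let

$$ S_j = \inf\{ s : s > S_{j-1},\ L_k(s) - L_k(S_{j-1}) \ge 1 \ \text{or}\ L_k(s) - L_k(s-) = -1 \}, $$

where it is understood that $\inf \emptyset = t_{k-1}$. Obviously $-1 \le L_k(S_j) - L_k(S_{j-1}) \le 2$, and from the supermartingale property we see that on $\{S_{j-1} < t_{k-1}\}$

$$ E\{L_k(S_j) - L_k(S_{j-1}) \mid \mathcal{F}_{t_k, S_{j-1}}\} \le 0, $$

and hence, since the increment is at least $-1$ (if $p$ denotes the conditional probability below, then $0 \ge p - (1 - p)$),

$$ P\{S_j < t_{k-1},\ L_k(S_j) - L_k(S_{j-1}) \ge 1 \mid \mathcal{F}_{t_k, S_{j-1}}\} \le 1/2. \tag{31} $$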

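It follows from (31) that between downward jumps the total increase of $L_k(s)$ is stochastically smaller than $1 + \gamma$, where $P\{\gamma = m\} = (1/2)^{m+1}$, $m = 0, 1, \ldots$. Since the total number of downward jumps is at most $D_k$, an easy large deviation estimate gives

$$ P\{L_k(t_{k-1}) \ge n^{4\varepsilon},\ D_k \le n^{7\varepsilon/2}\} = o(n^{-1}). $$

Hence by (22)

$$ P\{\max_k L_k(t_{k-1}) > n^{4\varepsilon}\} \le P\{\max_k D_k > n^{7\varepsilon/2}\} + \sum_k P\{L_k(t_{k-1}) \ge n^{4\varepsilon},\ D_k \le n^{7\varepsilon/2}\} \to 0 $$

as $n \to \infty$.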

4. Convergence to Brownian Motion

In this section we give a precise interpretation of (16) and indicate its proof. Let $Q_n(t,\theta)$ denote the martingale in (14), which in the notation of Section 3 can be written

$$ Q_n(t,\theta) = \sum_{i=1}^n \int_{[0,t]} \{z_i - \mu(s - y_i)\}\{N_i(ds) - A_i(ds)\} = \sum_{i=1}^n \int_{[0,t]} \{z_i - \mu(s)\}\{N_i(t,ds) - \Lambda_i(t,ds)\}. \tag{32} $$

It follows from the first representation of $Q_n$ in (32) and the independence of the different terms that the predictable quadratic variation of the martingale is

$$ \sum_{i=1}^n \int_{[0,t]} \{z_i - \mu(s - y_i)\}^2 A_i(ds) = \sum_{i=1}^n \int_{[0,t]} \{z_i - \mu(s)\}^2\, \Lambda_i(t,ds). \tag{33} $$


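Let

$$ v_f = \int_0^\infty E\big[ e^{\theta z_1}\{z_1 - \mu(s)\}^2;\ x_1 \wedge c_1 \ge s \big]\, \lambda(s)\, ds $$

and note that by the law of large numbers

$$ n^{-1} \sum_{i=1}^n \int_{[0,\infty)} \{z_i - \mu(s)\}^2\, \Lambda_i(\infty, ds) \to v_f $$

in probability. Hence for $0 < v < v_f$ and $T_n(v,\theta)$ defined by

$$ T_n(v,\theta) = \inf\Big\{ t : n^{-1} \sum_{i=1}^n \int_{[0,t]} \{z_i - \mu(s)\}^2\, \Lambda_i(t,ds) \ge v \Big\} $$

we have $P\{T_n(v,\theta) < \infty\} \to 1$ and

$$ n^{-1} \sum_{i=1}^n \int_{[0, T_n(v,\theta)]} \{z_i - \mu(s)\}^2\, \Lambda_i(T_n(v,\theta), ds) \to v \tag{34} $$

in probability. It follows from (34) and the form of the martingale central limit theorem given by Rebolledo (1980, p. 273, Proposition 1) that for every $0 < v < v_f$

$$ n^{-1/2}\, Q_n(T_n(\cdot,\theta), \theta) \Rightarrow W(\cdot) \tag{35} $$

on $[0,v]$ as $n \to \infty$, where $W$ is a standard Brownian motion.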



This result is unsatisfactory for statistical purposes because (33) is not an observable random variable, even under a simple null hypothesis when $\theta$ is assumed to be known. Consider now

$$ -\ddot\ell_n(t,\theta) = \int_{[0,t]} \sigma^2(t,s)\, N(t,ds), \tag{36} $$

where

$$ \sigma^2(t,s) = \sum_{j \in R(t,s)} \{z_j - \mu(t,s)\}^2 e^{\theta z_j} \Big/ \sum_{j \in R(t,s)} e^{\theta z_j}, $$

and let $\tau_n(v,\theta)$ be defined by (15).

Theorem 2. For each $0 < v < v_f$,

$$ P\{\tau_n(v,\theta) < \infty\} \to 1, $$

and on $[0,v]$

$$ n^{-1/2}\, \dot\ell_n(\tau_n(\cdot,\theta), \theta) \Rightarrow W(\cdot), \tag{37} $$

where $W(\cdot)$ is standard Brownian motion.

The key tools in the proof of Theorem 2 are Theorem 1, which shows that it suffices to prove (37) with $Q_n$ in place of $\dot\ell_n$, and Lemma 5 below, which shows that (34) holds with $\tau_n$ in place of $T_n$, so that the martingale central limit theorem applies to yield (35) with $\tau_n$ in place of $T_n$.

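Lemma 5. For small positive $\varepsilon$, as $n \to \infty$,

$$ P\Big\{ \sup_t \Big| \ddot\ell_n(t,\theta) + \sum_{i=1}^n \int_{[0,t]} \{z_i - \mu(s)\}^2\, \Lambda_i(t,ds) \Big| > n^{1-\varepsilon} \Big\} \to 0. \tag{38} $$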

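Proof. The proof is similar to that of Theorem 1, so we give only a general outline. Observe that

$$ \begin{aligned} \ddot\ell_n(t,\theta) + \sum_{i=1}^n \int_{[0,t]} \{z_i - \mu(s)\}^2\, \Lambda_i(t,ds) &= \int_{[0,t]} \{\mu(t,s) - \mu(s)\}^2\, \Lambda(t,ds) - \int_{[0,t]} \sigma^2(s)\{N(t,ds) - \Lambda(t,ds)\} \\ &\quad - \int_{[0,t]} \{\sigma^2(t,s) - \sigma^2(s)\}\{N(t,ds) - \Lambda(t,ds)\}, \end{aligned} \tag{39} $$

where

$$ \sigma^2(s) = E[\{z_1 - \mu(s)\}^2 e^{\theta z_1};\ x_1 \wedge c_1 \ge s] \big/ E(e^{\theta z_1};\ x_1 \wedge c_1 \ge s). $$

The identity (39) follows from (36), (26), and the decomposition $\sum_{i \in R(t,s)} \{z_i - \mu(s)\}^2 e^{\theta z_i} = [\sigma^2(t,s) + \{\mu(t,s) - \mu(s)\}^2]\, m(t,s)$. Each of the terms on the right hand side of (39) can be estimated by techniques similar to the proof of Theorem 1. For example, the third integral can be split into a part involving a difference of second moments and a part involving a difference of squares of first moments. The second moment piece is treated directly by the techniques of Theorem 1; for the first moment piece we use $a^2 - b^2 = (a+b)(a-b)$ and the boundedness of $a+b$ in order to apply the techniques of Theorem 1.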
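The decomposition used to obtain (39) is a weighted bias-variance identity: for weights $w_j = e^{\theta z_j}$ over a risk set and any constant $c$, $\sum_j w_j (z_j - c)^2 = [\sigma^2 + (\bar z_w - c)^2] \sum_j w_j$, where $\bar z_w$ and $\sigma^2$ are the weighted mean and variance, i.e. $\mu(t,s)$ and $\sigma^2(t,s)$. Taking $c = \mu(s)$ gives the first term on the right of (39). The sketch below checks the identity on random inputs; the function name is hypothetical.

```python
import math
import random

def decomposition_gap(theta, zs, const):
    """|LHS - RHS| for sum w (z - const)^2 = [sigma^2 + (zbar - const)^2] * sum w."""
    w = [math.exp(theta * zj) for zj in zs]
    m = sum(w)                                                         # m(t, s)
    zbar = sum(wj * zj for wj, zj in zip(w, zs)) / m                   # mu(t, s)
    sigma2 = sum(wj * (zj - zbar) ** 2 for wj, zj in zip(w, zs)) / m   # sigma^2(t, s)
    lhs = sum(wj * (zj - const) ** 2 for wj, zj in zip(w, zs))
    rhs = (sigma2 + (zbar - const) ** 2) * m
    return abs(lhs - rhs)

rng = random.Random(6)
worst = max(decomposition_gap(rng.uniform(-2, 2),
                              [rng.uniform(-1, 1) for _ in range(rng.randint(1, 40))],
                              rng.uniform(-1, 1))
            for _ in range(1000))
print(worst)  # of the order of floating-point rounding error
```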


5. Discussion

In order to turn Theorem 2 into a statistically interesting result, one must (a) relate the behavior of the score process of the partial likelihood given in (37) to that of the maximum partial likelihood estimator $\hat\theta_n$ defined by $\dot\ell_n(t,\hat\theta_n) = 0$, and (b) replace $\theta$ by $\hat\theta_n$ in (15), so that the desired renormalization of time is accomplished by observable random variables. This yields the main result of the paper.
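Theorem 3. Let $0 < v < v_f$ and

$$ \hat\tau_n(v) = \inf\{ t : -\ddot\ell_n(t, \hat\theta_n(t)) \ge vn \}. $$

For $0 < v_* < v^* < v_f$, as $n \to \infty$,

$$ n^{1/2}\, v\, [\hat\theta_n\{\hat\tau_n(v)\} - \theta] \Rightarrow W(v), \tag{40} $$

or equivalently

$$ -n^{-1/2}\, \ddot\ell_n[\hat\tau_n(v), \hat\theta_n\{\hat\tau_n(v)\}]\, [\hat\theta_n\{\hat\tau_n(v)\} - \theta] \Rightarrow W(v), \tag{41} $$

as processes in $v$ on $[v_*, v^*]$.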

We have omitted the proof of this theorem. Basically, one combines the results of Sections 3 and 4 with some Taylor series expansions to prove the consistency of $\hat\theta_n$, and then uses this consistency, (6), and Theorem 2 to prove Theorem 3.

A consequence of Theorem 3 is that in the time scale determined by $\hat\tau_n$, $\hat\theta_n$ can be approximated by a Brownian motion process for the purpose of sequentially testing statistical hypotheses about $\theta$ or for estimating $\theta$. Thus, if we would be satisfied with a repeated significance test or a truncated sequential probability ratio test for testing a hypothesis about the drift of a Brownian motion process, we can obtain an analogous asymptotic test for $\theta$.

Since $n^{1/2} W(v)$ and $W(nv)$, $0 \le v < \infty$, have the same joint distributions, it is tempting to rewrite (40) as

$$ nv\,[\hat\theta_n\{\hat\tau_n(v)\} - \theta] \approx W(nv) \qquad \text{for } nv_* \le nv \le nv^* \tag{42} $$

and treat $nv = u$ as one variable. A loose interpretation of (42) would be that

$$ -\ddot\ell_n\{t, \hat\theta_n(t)\}\, \{\hat\theta_n(t) - \theta\} \tag{43} $$

behaves approximately like

$$ W[-\ddot\ell_n\{t, \hat\theta_n(t)\}] \tag{44} $$

provided $-\ddot\ell_n = -\ddot\ell_n\{t, \hat\theta_n(t)\}$ is "large". Of course, the theorem specifies that "large" means proportional to $n$, with constants of proportionality bounded away from 0 and from $v_f$.
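Read loosely, (42) to (44) say that on the information scale $u = -\ddot\ell_n\{t, \hat\theta_n(t)\}$ the statistic $S(u) = u\{\hat\theta_n(t) - \theta_0\}$ behaves roughly like a Brownian motion with drift $\theta - \theta_0$, so a stopping boundary designed for Brownian motion can be applied to it. The sketch below applies a repeated significance test of $H_0\colon \theta = \theta_0$ of the form "stop and reject at the first look with $|S(u)| \ge b\sqrt{u}$, $u_{\min} \le u \le u_{\max}$" to a sequence of looks. The constants $b$, $u_{\min}$, $u_{\max}$ are hypothetical design parameters (in a real design $b$ would be calibrated, for instance by simulating Brownian paths, to give a desired significance level), the "accept at truncation" rule is one choice among several, and the function names are hypothetical.

```python
import math
import random

def repeated_significance_test(looks, b, u_min, u_max):
    """Apply |S(u)| >= b * sqrt(u) at each look (u, S) with u_min <= u <= u_max.

    looks: (u, S) pairs in increasing u, where u is the observed information
    -d2l_n{t, theta_hat_n(t)} and S = u * (theta_hat_n(t) - theta_0).
    Returns (u, S, reject) for the stopping look.
    """
    last = None
    for u, s in looks:
        if u < u_min:
            continue
        if u > u_max:
            break
        last = (u, s)
        if abs(s) >= b * math.sqrt(u):
            return u, s, True          # boundary crossed: reject H_0
    if last is None:
        raise ValueError("no look falls in [u_min, u_max]")
    return last[0], last[1], False     # truncated without crossing: accept H_0

def brownian_looks(drift, u_grid, rng):
    """Brownian motion with the given drift, observed at information levels u_grid."""
    s, prev, looks = 0.0, 0.0, []
    for u in u_grid:
        s += drift * (u - prev) + rng.gauss(0.0, math.sqrt(u - prev))
        prev = u
        looks.append((u, s))
    return looks

# Usage: a fixed sequence of looks, then a simulated path with drift 0.3.
print(repeated_significance_test([(1, 0.5), (2, 1.0), (4, 5.0), (8, 1.0)],
                                 b=2.0, u_min=1, u_max=10))   # (4, 5.0, True)
path = brownian_looks(0.3, [10.0 * k for k in range(1, 11)], random.Random(7))
print(repeated_significance_test(path, b=2.8, u_min=10, u_max=100))
```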

In practice it is probably unnecessary to interpret the minimum information requirement stringently. In fact, close scrutiny of the proof of Theorem 2 shows that $n$ in (15) could be replaced by $n^{1-\delta}$ for suitable small positive $\delta$, and then the normalizing $n^{-1/2}$ in (37) would become $n^{-(1-\delta)/2}$. Hence the approximation of (43) by (44) is valid for values of $-\ddot\ell_n$ of smaller order of magnitude than $n$, but we do not know how much smaller. We conjecture that with a proper reformulation it is possible to give an interpretation to the approximation of (43) by (44) provided only that $-\ddot\ell_n$ is large.

The maximum information requirement that $-\ddot\ell_n < nv_f$ is probably more important and could conceivably cause some difficulty in practice, since $v_f$ is essentially never known. However, if the patient arrival rate is sufficiently great and the experimental period comparatively short, so that some reasonable percentage of the total number put on test is still alive at the end of the experiment, no problems should arise.

It seems desirable to conduct a Monte Carlo experiment to get some feeling for the practical limitations of Theorem 3. For the related problem of testing the null hypothesis $\theta = 0$, Gail, DeMets, and Slud (1981) conclude that the score statistic under the null hypothesis is reasonably approximated by a Brownian motion. Their time renormalization is not appropriate for general $\theta$, however.

Slud's (1982) theoretical approach is superficially similar to ours in that he introduces a martingale to approximate the score process of the partial likelihood. His martingale is different from ours (although it is a special case of the class of martingales described by Lemma 1). He considers only the null hypothesis $\theta = 0$ and uses a time renormalization which would be inappropriate for general $\theta$. Also, what corresponds to our Lemma 4 is essentially his assumption A.5. This assumption is never actually verified, although Slud states that it can be verified under various sets of conditions, all of which require strong hypotheses on the arrival process.

It is not obvious how one should generalize these results to multidimensional $\theta$. Except when $\theta = 0$, one cannot expect that the information about the various coordinates of $\theta$ accumulates at the same rate, and hence one cannot generalize (15) directly. In the case where one coordinate of $z$ is a treatment indicator, it seems possible to study the corresponding coordinate of $\theta$ sequentially by making a time change in terms of the residual variance of that covariate's regression on the other coordinates. We hope to discuss this problem in a future publication.

